Strengthen Learning Tolerance for Weakly Supervised Object Localization

Abstract

Weakly supervised object localization (WSOL) aims to learn to localize objects of interest using only image-level labels as supervision. Although numerous efforts have been made in this field, recent approaches still suffer from two challenges:

1. Part Domination Issue: The localizer tends to focus on the locally discriminative regions of an object rather than on the whole object.

2. Learning Robustness Issue: The localizer is overly sensitive to variations in the input images, so its localization results are rarely robust to arbitrary visual stimuli.

Methodology

To address these issues, we propose SLT-Net, a novel WSOL framework that strengthens learning tolerance. Specifically, we consider two learning tolerance strengthening mechanisms:

1. Semantic Tolerance Strengthening Mechanism: Allows the localizer to make mistakes when classifying semantically similar categories, so that it does not concentrate too heavily on discriminative local regions. This helps the model attend to the entire object rather than only its most discriminative parts.

2. Visual Stimuli Tolerance Strengthening Mechanism: Encourages the localizer to be robust to different image transformations, so that prediction quality does not depend on the specific form of each input image. This improves the model's generalization across varied visual inputs.

These two mechanisms work synergistically to address both the part domination and learning robustness issues, enabling the model to produce more complete and stable object localizations.
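As a rough illustration only, the two mechanisms could be read as two loss terms. The sketch below is not SLT-Net's actual formulation, which this summary does not spell out. It assumes (a) a classification loss that does not penalize probability mass placed on a hand-specified set of similar classes, and (b) a consistency penalty between the localization map of an image and the map of its horizontally flipped copy. The names `tolerant_cross_entropy`, `flip_consistency`, and the `similar` argument are hypothetical.

```python
import math


def tolerant_cross_entropy(logits, label, similar):
    """Negative log of the softmax mass on `label` plus its tolerated classes.

    Assumption: `similar` lists class indices considered semantically close
    to `label`; predicting any of them is not penalized.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    allowed = {label} | set(similar)
    mass = sum(exps[c] for c in allowed) / total
    return -math.log(mass)


def hflip(cam):
    """Horizontally flip a 2D map stored as a list of rows."""
    return [row[::-1] for row in cam]


def flip_consistency(cam_original, cam_of_flipped):
    """Mean squared difference between the flipped map of the original image
    and the map predicted for the flipped image (zero when they agree)."""
    aligned = hflip(cam_original)
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(aligned, cam_of_flipped)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    logits = [2.0, 2.0, 0.0]
    # Tolerating class 1 lowers the loss compared with the strict case.
    print(tolerant_cross_entropy(logits, 0, []))
    print(tolerant_cross_entropy(logits, 0, [1]))
    # A map that flips along with the input yields zero penalty.
    print(flip_consistency([[1.0, 0.0]], [[0.0, 1.0]]))
```

In this reading, the first term relaxes the pressure to separate near-identical categories, and the second ties the localization output to the transformation applied to the input. Horizontal flipping stands in here for the broader set of image transformations the text mentions.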

Experimental Results

We conduct comprehensive experimental comparisons on two widely used datasets:

• CUB-200-2011 (Caltech-UCSD Birds dataset)

• ILSVRC2012 (ImageNet Large Scale Visual Recognition Challenge)

The results demonstrate the effectiveness of our proposed approach. SLT-Net achieves superior performance compared to existing state-of-the-art methods, showing significant improvements in:

• Localization accuracy across different object categories

• Robustness to various image transformations and visual variations

• Ability to capture complete object regions rather than just discriminative parts
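This summary does not state which metric underlies "localization accuracy". A common convention in WSOL evaluation counts a predicted box as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. The sketch below assumes that convention, with boxes given as `(x0, y0, x1, y1)` tuples. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of images whose predicted box reaches the IoU threshold.

    Assumption: one predicted box and one ground-truth box per image.
    """
    hits = sum(box_iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
    gts = [(0, 0, 2, 2), (1, 0, 3, 2)]
    print(localization_accuracy(preds, gts))
```

A box that covers only the most discriminative part of an object tends to fall below the IoU threshold, which is why a metric of this kind is sensitive to the part domination issue.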

Keywords: Deep Learning, Strengthen Learning Tolerance, Object Localization, Computer Vision, Weakly Supervised, Instance Segmentation
